Recent news media coverage and my own personal interest inspired me to explore a school vaccination data set for WA state. WA DOH employees generously assembled 3 data sets for my project, covering school years 2011-2012, 2012-2013, and 2013-2014. The 2011-2012 set is the most comprehensive in terms of schools covered and relevant fields. Each row in a data set is an observation about one school. The 3 data sets were bound into one data set, then explored.
The initial data and cleaning/assembly of the combined data frame can be found here: githubrepo
Additional cleaning and wrangling can be found here: githubrepo
An initial exploration, with some additional corrections for student enrollment, can be found here: githubrepo
The current state is a rough approach and far from perfect. I plan to revisit this as I discover additional elements that need cleaning. I am currently working on a more readable and organized version of the data wrangling using piping, so that changes are easy to make and push to the .rds file that this project loads as its data.
There are some aspects of vaccination and the spread of disease through populations that are best left to subject matter experts. I consulted CDC publications throughout this analysis and I encourage the reader to do the same.
For example, what is the exemption rate threshold that should not be exceeded in order to maintain herd immunity? This is a complex question. A typical answer is 10%, but that figure rests on assumptions about how populations mix and how fast a contagion spreads. The question is even more complex in the context of school data, where there is a wide range of populations and students may come from a large geographical area. The latter is particularly true of private schools, whose students do not necessarily live within a specific geographical region.
With regard to the spread of disease, some diseases are more contagious than others, and the opportunities for exposure vary. For example, while Hep B is fairly contagious (~100 times more than HIV), it is blood-borne, so the opportunity for exposure is lower than for something like the common cold. Measles is particularly contagious with a high exposure opportunity, and information available from the CDC suggests that vaccination rates at or below 95% pose a risk. Bear in mind that this factors in that not all of those vaccinated will actually be immune, so even at a 100% vaccination rate, 100% immunity would not be expected.
For the purposes of this analysis, I set the outbreak risk thresholds at 90% for general vaccination rates and 95% for MMR. That translates to 5% exempt for MMR and 10% exempt general.
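The thresholds above can be applied as a simple per-school flag. The following is a minimal sketch in R, assuming a data frame with `percent_exempt` and `percent_exempt_MMR` columns named as in this data set. The toy values, the helper name `flag_outbreak_risk`, the flag column names, and the choice of an inclusive comparison are illustrative assumptions, not part of the original analysis.

```r
# Risk thresholds from the text, expressed as exemption rates (percent)
general_threshold <- 10  # percent exempt, any vaccine
mmr_threshold <- 5       # percent exempt, MMR

# Hypothetical helper: adds two logical flag columns to a data frame that has
# percent_exempt and percent_exempt_MMR columns.
# Using >= (at or above the threshold) is an assumption.
flag_outbreak_risk <- function(df) {
  df$at_risk_general <- df$percent_exempt >= general_threshold
  df$at_risk_MMR <- df$percent_exempt_MMR >= mmr_threshold
  df
}

# Toy data for illustration only (not taken from the DOH data)
toy <- data.frame(
  school_name = c("School A", "School B", "School C"),
  percent_exempt = c(5.5, 25.2, 9.0),
  percent_exempt_MMR = c(3.3, 24.0, 6.1),
  stringsAsFactors = FALSE
)

flagged <- flag_outbreak_risk(toy)
print(flagged)
```

In this toy example a school can fall under the general threshold while still exceeding the stricter MMR threshold, which is why the two flags are kept separate.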
## 'data.frame': 6696 obs. of 33 variables:
## $ school_year : Factor w/ 3 levels "2011","2012",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ school_code : chr "1656" "4500" "2834" "3209" ...
## $ school_name : chr "10th Street School" "A G West Black Hills High School" "A J West Elementary" "Abraham Lincoln Elementary" ...
## $ school_type : Factor w/ 2 levels "not_Public","Public": 2 2 2 2 1 2 1 2 2 2 ...
## $ school_city : chr "Marysville" "Tumwater" "Aberdeen" "WENATCHEE" ...
## $ school_state : chr "WA" "WA" "WA" "WA" ...
## $ school_zip : chr "98271" "98512" "98520" "98801" ...
## $ school_county : chr "SNOHOMISH" "THURSTON" "GRAYS HARBOR" "CHELAN" ...
## $ school_district : chr "Marysville School District" "Tumwater School District" "Aberdeen School District" "Wenatchee School District" ...
## $ district_code : num 31025 34033 14005 4246 31016 ...
## $ grade_levels : chr "6-8" "9-12" "K-6" "K-5" ...
## $ reported : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 3 3 3 3 3 ...
## $ enrolled : num 181 858 450 510 167 325 48 203 205 329 ...
## $ total_exemptions : int 10 44 4 16 42 12 6 9 11 25 ...
## $ medical_exempt : num 0 0 0 1 0 1 2 2 0 2 ...
## $ personal_exempt : int 10 44 4 13 40 11 3 7 11 20 ...
## $ religious_exempt : num 0 0 0 2 2 0 1 0 0 3 ...
## $ exempt_DTTd : int 4 17 2 13 40 5 4 4 9 9 ...
## $ exempt_pertussis : int 5 36 2 15 40 3 4 4 9 9 ...
## $ exempt_polio : int 5 28 3 15 40 3 4 3 10 9 ...
## $ exempt_MMR : int 6 35 2 15 40 4 4 6 9 2 ...
## $ exempt_hepB : int 5 31 2 13 42 8 5 4 11 15 ...
## $ exempt_varicella : int 0 0 2 11 40 0 5 0 6 6 ...
## $ percent_exempt_DTTd : num 2.21 1.98 0.44 2.55 23.95 ...
## $ percent_exempt_pertussis: num 2.76 4.2 0.44 2.94 23.95 ...
## $ percent_exempt_polio : num 2.76 3.26 0.67 2.94 23.95 ...
## $ percent_exempt_MMR : num 3.31 4.08 0.44 2.94 23.95 ...
## $ percent_exempt_hepB : num 2.76 3.61 0.44 2.55 25.15 ...
## $ percent_exempt_varicella: num 0 0 0.44 2.16 23.95 ...
## $ percent_exempt : num 5.52 5.13 0.89 3.14 25.15 ...
## $ nonmedical_exempt : num 10 44 4 15 42 11 4 7 11 23 ...
## $ percent_nonmedexempt : num 5.525 5.128 0.889 2.941 25.15 ...
## $ percent_medical_exempt : num 0 0 0 0.196 0 ...
## 'data.frame': 6696 obs. of 19 variables:
## $ school_year : Factor w/ 3 levels "2011","2012",..: 2 2 2 2 2 2 3 3 3 3 ...
## $ school_code : chr "3706" "2308" "2131" "2757" ...
## $ school_name : chr "Rose Hill Junior High" "Kirkland Junior High" "Wapato Middle School" "Satus Elementary" ...
## $ school_type : Factor w/ 2 levels "not_Public","Public": 2 2 2 2 2 2 2 2 2 2 ...
## $ school_city : chr "Redmond" "Kirkland" "Wapato" "Wapato" ...
## $ school_state : chr "WA" "WA" "WA" "WA" ...
## $ school_zip : chr "98052" "98033" "98951" "98951" ...
## $ school_county : chr "KING" "KING" "YAKIMA" "YAKIMA" ...
## $ school_district : chr "Lake Washington School District" "Lake Washington School District" "Wapato School District" "Wapato School District" ...
## $ district_code : num 17414 17414 39207 39207 39207 ...
## $ enrolled : num 5076 4157 3540 3540 3540 ...
## $ total_exemptions : int 121 171 8 8 8 8 7 7 7 7 ...
## $ medical_exempt : num 15 13 5 5 5 5 5 5 5 5 ...
## $ nonmedical_exempt : num 106 161 3 3 3 3 2 2 2 2 ...
## $ exempt_MMR : int 73 101 8 8 8 8 7 7 7 7 ...
## $ percent_exempt : num 2.38 4.11 0.23 0.23 0.23 0.23 0.2 0.2 0.2 0.2 ...
## $ percent_medical_exempt: num 0.296 0.313 0.141 0.141 0.141 ...
## $ percent_nonmedexempt : num 2.0883 3.873 0.0847 0.0847 0.0847 ...
## $ percent_exempt_MMR : num 1.44 2.43 0.23 0.23 0.23 0.23 0.2 0.2 0.2 0.2 ...
## [1] "school_year" "school_code"
## [3] "school_name" "school_type"
## [5] "school_city" "school_state"
## [7] "school_zip" "school_county"
## [9] "school_district" "district_code"
## [11] "enrolled" "total_exemptions"
## [13] "medical_exempt" "nonmedical_exempt"
## [15] "exempt_MMR" "percent_exempt"
## [17] "percent_medical_exempt" "percent_nonmedexempt"
## [19] "percent_exempt_MMR"
## school_year school_code school_name school_type
## 2011:1934 Length:6696 Length:6696 not_Public: 968
## 2012:2428 Class :character Class :character Public :5728
## 2013:2334 Mode :character Mode :character
##
##
##
## school_city school_state school_zip
## Length:6696 Length:6696 Length:6696
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## school_county school_district district_code enrolled
## Length:6696 Length:6696 Min. : 1109 Min. : 1
## Class :character Class :character 1st Qu.:17001 1st Qu.: 212
## Mode :character Mode :character Median :20400 Median : 425
## Mean :22236 Mean : 474
## 3rd Qu.:31016 3rd Qu.: 585
## Max. :39209 Max. :5076
## total_exemptions medical_exempt nonmedical_exempt exempt_MMR
## Min. : 0.0 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 7.0 1st Qu.: 0.000 1st Qu.: 6.00 1st Qu.: 4.0
## Median : 19.0 Median : 2.000 Median : 17.00 Median : 11.0
## Mean : 25.5 Mean : 3.534 Mean : 22.28 Mean : 14.8
## 3rd Qu.: 35.0 3rd Qu.: 4.000 3rd Qu.: 31.00 3rd Qu.: 20.0
## Max. :436.0 Max. :214.000 Max. :330.00 Max. :269.0
## percent_exempt percent_medical_exempt percent_nonmedexempt
## Min. : 0.00 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 2.81 1st Qu.: 0.0000 1st Qu.: 2.373
## Median : 5.01 Median : 0.3883 Median : 4.357
## Mean : 6.67 Mean : 0.7957 Mean : 5.977
## 3rd Qu.: 7.97 3rd Qu.: 0.9009 3rd Qu.: 7.112
## Max. :100.00 Max. :33.3333 Max. :100.000
## percent_exempt_MMR
## Min. : 0.000
## 1st Qu.: 1.470
## Median : 2.840
## Mean : 4.208
## 3rd Qu.: 4.680
## Max. :100.000
This is a long-tailed distribution, a feature exhibited by most distributions in this data set. Log transformations are used to visualize the data.
The data covers all counties in WA state. Since the counties have a large range of populations, it would be interesting to see the distribution of schools in each county.
As expected, the number of schools varies widely by county, with King and Pierce counties (the counties with the largest urban population centers) having the most schools.
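As a rough illustration of why a log transformation helps with long-tailed data, the sketch below compares a simulated right-skewed variable before and after a `log10` transform. The simulated values and variable names are assumptions for illustration; they are not the enrollment or exemption data.

```r
set.seed(42)
# Simulated long-tailed (log-normal) values standing in for a skewed variable
x <- rlnorm(1000, meanlog = 3, sdlog = 1)

# On the raw scale the mean sits well above the median (long right tail)
raw_ratio <- mean(x) / median(x)

# After log10 the distribution is roughly symmetric, so mean and median
# are close to each other
log_x <- log10(x)
log_difference <- mean(log_x) - median(log_x)

print(c(raw_ratio = raw_ratio, log_difference = log_difference))

# Zero counts need care: log10(0) is -Inf, so log10(x + 1) is a common
# workaround when a variable such as an exemption count can be zero
print(log10(0 + 1))
```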
Total exemptions summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 7.0 19.0 25.5 35.0 436.0
Total exemptions summary by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 10.00 23.00 28.71 38.00 436.00
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 6.00 18.00 23.84 33.00 371.00
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 7.00 18.00 24.56 33.00 357.00
Summary of percent exempt
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 2.81 5.01 6.67 7.97 100.00
Summary of percent exempt by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.182 5.320 6.761 8.198 60.000
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.690 4.855 6.642 7.912 100.000
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.700 4.875 6.623 7.920 100.000
Note: Higher exemption rates (above roughly 50%) appear to be associated with lower levels of enrollment (< 100 students). This will be investigated in the bivariate section.
Summary of medical exemptions
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 3.534 4.000 214.000
Summary of medical exemptions by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 2.937 3.000 168.000
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 3.014 4.000 186.000
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 4.569 5.000 214.000
Summary of percent medical exempt
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.3883 0.7957 0.9009 33.3300
Summary of percent medical exempt by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.3140 0.5583 0.6891 17.6800
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.3711 0.7112 0.8777 33.3300
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.5128 1.0800 1.0980 29.6300
Summary of non-medical exemptions
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 6.00 17.00 22.28 31.00 330.00
Summary of non-medical exemptions by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 9.00 20.00 25.77 35.00 327.00
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 5.00 15.00 21.18 29.00 330.00
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 6.00 15.00 20.55 28.00 313.00
Summary of percent non-medical exemptions
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.373 4.357 5.977 7.112 100.000
Summary of percent non-medical exemptions by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.828 4.785 6.202 7.524 60.000
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.272 4.324 6.080 7.147 100.000
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.212 4.037 5.684 6.629 100.000
Summary MMR exemptions
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 4.0 11.0 14.8 20.0 269.0
Summary MMR exemptions by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 5.00 12.00 15.29 21.00 229.00
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 4.00 11.00 14.58 20.00 267.00
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 4.00 11.00 14.63 20.00 269.00
Summary percent MMR exemptions
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.470 2.840 4.208 4.680 100.000
Summary percent MMR exemptions by school year
## vaccReported$school_year: 2011
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.440 2.720 3.840 4.547 54.950
## --------------------------------------------------------
## vaccReported$school_year: 2012
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.518 2.920 4.509 4.812 100.000
## --------------------------------------------------------
## vaccReported$school_year: 2013
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.480 2.845 4.200 4.640 100.000
Gather the terms for exemption type (melt the data frame), then calculate the % of total exemptions that are of a particular type (medical or non-medical). While this is not a tidy data frame, it is a convenient way to look at observations by exemption type. I am using this to develop a better understanding of the proportion of exemptions that are non-medical.
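A minimal sketch of this reshaping step in base R follows. The toy rows and the helper name `gather_exemptions` are assumptions for illustration, and the original analysis may have used a different function (for example from `reshape2` or `tidyr`). The sketch also guards against schools with zero total exemptions, where the percentage would otherwise evaluate to `NaN` or `Inf`.

```r
# Toy wide-format data for illustration only
wide <- data.frame(
  school_code = c("A1", "B2", "C3"),
  total_exemptions = c(10, 0, 4),
  medical_exempt = c(1, 0, 0),
  nonmedical_exempt = c(9, 0, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack the two exemption-type columns into long format,
# giving one row per school per exemption type.
gather_exemptions <- function(df) {
  types <- c("medical_exempt", "nonmedical_exempt")
  id_cols <- setdiff(names(df), types)
  pieces <- lapply(types, function(type) {
    out <- df[, id_cols, drop = FALSE]
    out$exemption_type <- type
    out$exemptions <- df[[type]]
    out
  })
  long <- do.call(rbind, pieces)
  long$exemption_type <- factor(long$exemption_type, levels = types)
  # Share of a school's total exemptions that are of this type;
  # NA when the school has no exemptions, avoiding division by zero.
  long$percent_exemption_type <- ifelse(
    long$total_exemptions > 0,
    100 * long$exemptions / long$total_exemptions,
    NA_real_
  )
  rownames(long) <- NULL
  long
}

long <- gather_exemptions(wide)
print(long)
```

Each school appears twice in the result, once per exemption type, so the long frame has twice as many rows as the wide one.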
For the purposes of this analysis I decided to subset the data frame to enrollment of 10-2500. The earlier enrollment distribution showed that most schools have between 100 and 1000 students. However, there may still be valuable information in schools with lower or higher enrollment. There are very few schools with more than 2500 students. During my data wrangling I did a spot check of schools at the high enrollment extremes and found that many of the schools at the fringe of the high range were not accurately reported. It was impractical to scour and eliminate them all. For low-enrollment schools, one to a few exemptions create large fluctuations in percent exemption. By making this subset I capture most of the data, limit erroneous data, and mitigate large fluctuations in percent exemptions that result from low enrollment.
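The enrollment filter described above can be sketched as follows. The toy rows are assumptions for illustration, and treating both bounds as inclusive is also an assumption, since the text only gives the range 10-2500.

```r
# Toy data for illustration only
schools <- data.frame(
  school_code = c("A1", "B2", "C3", "D4"),
  enrolled = c(5, 24, 480, 3540),
  stringsAsFactors = FALSE
)

# Keep schools with enrollment in the 10-2500 range
# (inclusive bounds are an assumption)
in_range <- subset(schools, enrolled >= 10 & enrolled <= 2500)
print(in_range)
```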
## 'data.frame': 13160 obs. of 20 variables:
## $ school_year : Factor w/ 3 levels "2011","2012",..: 2 2 2 3 2 2 1 2 1 2 ...
## $ school_code : chr "1456" "800R" "3808" "3808" ...
## $ school_name : chr "Tacoma Waldorf School" "Rising Tide School" "Waldron Island School" "Waldron Island School" ...
## $ school_type : Factor w/ 2 levels "not_Public","Public": 1 1 2 2 2 2 2 1 1 1 ...
## $ school_city : chr "Tacoma" "Olympia" "WALDRON ISLAND" "Waldron Island" ...
## $ school_state : chr "WA" "WA" "WA" "WA" ...
## $ school_zip : chr "98405" "98501" "98297" "98297" ...
## $ school_county : chr "PIERCE" "THURSTON" "SAN JUAN" "SAN JUAN" ...
## $ school_district : chr "Tacoma School District" "Olympia School District" "Orcas Island School District" "Orcas Island School District" ...
## $ district_code : num 27010 34111 28137 28137 2250 ...
## $ enrolled : num 24 14 12 14 10 82 25 100 91 22 ...
## $ total_exemptions : int 20 11 9 10 7 50 15 55 50 12 ...
## $ exempt_MMR : int 20 11 8 9 7 27 11 55 50 8 ...
## $ percent_exempt : num 83.3 78.6 75 71.4 70 ...
## $ percent_medical_exempt: num 0 0 0 0 10 ...
## $ percent_nonmedexempt : num 91.7 78.6 75 71.4 60 ...
## $ percent_exempt_MMR : num 83.3 78.6 66.7 64.3 70 ...
## $ exemption_type : Factor w/ 2 levels "medical_exempt",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ exemptions : num 0 0 0 0 1 2 0 0 0 0 ...
## $ percent_exemption_type: num 0 0 0 0 14.3 ...
## [1] "school_year" "school_code"
## [3] "school_name" "school_type"
## [5] "school_city" "school_state"
## [7] "school_zip" "school_county"
## [9] "school_district" "district_code"
## [11] "enrolled" "total_exemptions"
## [13] "exempt_MMR" "percent_exempt"
## [15] "percent_medical_exempt" "percent_nonmedexempt"
## [17] "percent_exempt_MMR" "exemption_type"
## [19] "exemptions" "percent_exemption_type"
## school_year school_code school_name school_type
## 2011:3834 Length:13160 Length:13160 not_Public: 1822
## 2012:4740 Class :character Class :character Public :11338
## 2013:4586 Mode :character Mode :character
##
##
##
##
## school_city school_state school_zip
## Length:13160 Length:13160 Length:13160
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## school_county school_district district_code enrolled
## Length:13160 Length:13160 Min. : 1109 Min. : 10.0
## Class :character Class :character 1st Qu.:17001 1st Qu.: 222.8
## Mode :character Mode :character Median :20094 Median : 427.0
## Mean :22202 Mean : 471.2
## 3rd Qu.:31016 3rd Qu.: 586.0
## Max. :39209 Max. :2480.0
##
## total_exemptions exempt_MMR percent_exempt percent_medical_exempt
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 8.00 1st Qu.: 5.00 1st Qu.: 2.897 1st Qu.: 0.0000
## Median : 20.00 Median : 11.00 Median : 5.050 Median : 0.4075
## Mean : 25.82 Mean : 14.98 Mean : 6.682 Mean : 0.8072
## 3rd Qu.: 35.00 3rd Qu.: 20.00 3rd Qu.: 7.980 3rd Qu.: 0.9117
## Max. :436.00 Max. :269.00 Max. :83.330 Max. :33.3333
##
## percent_nonmedexempt percent_exempt_MMR exemption_type
## Min. : 0.000 Min. : 0.000 medical_exempt :6580
## 1st Qu.: 2.447 1st Qu.: 1.540 nonmedical_exempt:6580
## Median : 4.396 Median : 2.870
## Mean : 5.979 Mean : 4.206
## 3rd Qu.: 7.121 3rd Qu.: 4.690
## Max. :91.667 Max. :100.000
##
## exemptions percent_exemption_type
## Min. : 0.00 Min. : 0
## 1st Qu.: 1.00 1st Qu.: 8
## Median : 5.00 Median : 50
## Mean : 13.07 Mean :Inf
## 3rd Qu.: 18.00 3rd Qu.: 92
## Max. :330.00 Max. :Inf
## NA's :496
Public and private schools have similar distributions and rates of both types of exemptions, for all three school years. Public schools have a larger proportion of exemptions of both types.
My data set contains 6696 observations of 19 variables. There are 2 categorical variables: school_year, with 3 levels (2011, 2012, 2013), and school_type, with 2 levels (Public, not_Public).
Number of exemptions per school: total, by type (medical and non-medical), and MMR.
School enrollment and school type.
I computed a percentage for total exemptions and for each exemption type by dividing the corresponding count by enrollment, for each school.
I was surprised to see that two of the smaller counties, Ferry and San Juan, had distributions shifted to a higher percent of all exemptions than King County. The narrative in the Seattle media led me to believe that the highest exemption rates would be observed in King County and the Seattle area in general. I was also surprised to find that public and private schools have similar distributions, albeit different proportions, with respect to exemption type.
This is the case on a proportion basis, but not for rate. However, the schools in these counties tend to be much smaller, so the rates shift dramatically with each exempt student. There was substantial tidying of the original data sets, as per the GitHub links in the introduction. I extracted a subset of the data consisting of only schools that reported exemptions for the three school years. This eliminated ~1200 observations but was necessary, since non-reporting schools are missing all exemption values and typically enrollment as well.
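The percentage calculation can be sketched as below. The counts are the first three schools shown in the 33-variable structure output, and the results agree with the `percent_exempt` and `percent_exempt_MMR` values printed there. The helper name `pct_of_enrolled` and the rounding to two decimals are assumptions for illustration.

```r
# Counts for the first three schools in the structure output
schools <- data.frame(
  enrolled = c(181, 858, 450),
  total_exemptions = c(10, 44, 4),
  exempt_MMR = c(6, 35, 2)
)

# Hypothetical helper: count as a percent of enrolled students, rounded to
# 2 decimals; NA when enrollment is missing or zero
pct_of_enrolled <- function(count, enrolled) {
  ifelse(!is.na(enrolled) & enrolled > 0,
         round(100 * count / enrolled, 2),
         NA_real_)
}

schools$percent_exempt <- pct_of_enrolled(schools$total_exemptions, schools$enrolled)
schools$percent_exempt_MMR <- pct_of_enrolled(schools$exempt_MMR, schools$enrolled)
print(schools)
```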
## geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
For most of the data (enrollment up to ~1500) there is a linear relationship between total exemptions and number of students enrolled, which isn’t surprising.
It would be more interesting to look at the rate, since it is normalized for population.
## Warning: Removed 67 rows containing missing values (stat_summary).
## Warning: Removed 67 rows containing missing values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).
Mean percent exempt has high variance at low levels of enrollment.
This makes sense, since a small change in the number of exemptions represents a larger proportion of the population in a small school than in a school with higher enrollment.
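To make the size of this effect concrete, the sketch below computes how many percentage points a single additional exempt student adds at a few enrollment levels. The enrollment values and the function name `pct_points_per_exemption` are chosen for illustration only.

```r
# Percentage points added to percent exempt by one additional exempt student
pct_points_per_exemption <- function(enrolled) 100 / enrolled

enrollment <- c(20, 100, 500, 1000)
effect <- data.frame(
  enrolled = enrollment,
  pct_points = pct_points_per_exemption(enrollment)
)
print(effect)
```

One exemption moves a 20-student school by 5 percentage points but a 500-student school by only 0.2, so rates for very small schools are much noisier.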
Percent exemption, both general and specific, is largely flat across enrollment.
##
## Pearson's product-moment correlation
##
## data: vaccReported$total_exemptions and vaccReported$percent_exempt
## t = 29.4422, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3172190 0.3596349
## sample estimates:
## cor
## 0.338599
##
## Pearson's product-moment correlation
##
## data: vaccReported$total_exemptions and vaccReported$percent_medical_exempt
## t = 23.5303, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2541234 0.2983711
## sample estimates:
## cor
## 0.2763937
##
## Pearson's product-moment correlation
##
## data: vaccReported$total_exemptions and vaccReported$percent_nonmedexempt
## t = 24.2471, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2619730 0.3060127
## sample estimates:
## cor
## 0.2841427
##
## Pearson's product-moment correlation
##
## data: vaccReported$total_exemptions and vaccReported$percent_exempt_MMR
## t = 14.6517, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1529677 0.1993854
## sample estimates:
## cor
## 0.1762745
Total exemptions correlate weakly and positively with the overall exemption rate (r ≈ 0.34) and with the specific exemption rates (r ≈ 0.18 to 0.28). The medical and non-medical exemption rates have about the same correlation with total exemptions (r ≈ 0.28 for both).
##
## Pearson's product-moment correlation
##
## data: vaccReported$enrolled and vaccReported$nonmedical_exempt
## t = 52.0693, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5196347 0.5537363
## sample estimates:
## cor
## 0.5369048
Enrollment correlates positively with non-medical exemptions (r ≈ 0.54), and MMR exemptions appear to follow the same pattern, although the test output above covers only nonmedical_exempt.
There is an interesting relationship between total exemptions and medical exemptions. The relationship is weak up to about 150 total exemptions, then rises steeply, and turns negative at about 300 exemptions. Some of this is because there are very few schools beyond 150 exemptions, but note from the point size that this does not track strictly with school size (number of enrolled students).
There is a strong positive correlation between total exemptions and non-medical exemptions. There is also a strong positive correlation between total exemptions and MMR exemptions, but it is not as strong as for non-medical exemptions.
##
## Call:
## lm(formula = percent_nonmedexempt ~ percent_exempt, data = vaccReported)
##
## Coefficients:
## (Intercept) percent_exempt
## -0.2264 0.9300
##
## Call:
## lm(formula = percent_exempt_MMR ~ percent_exempt, data = vaccReported)
##
## Coefficients:
## (Intercept) percent_exempt
## -0.5163 0.7083
The code is shown here for the sake of clarity. It calculates the predicted MMR and non-medical exemption rates from the linear fits.
# Solve for y in y = m*x + b for a given x, using a fitted simple linear model
lm_y <- function(lin, x) {
  m <- coef(lin)[2]  # slope
  b <- coef(lin)[1]  # intercept
  return((m * x) + b)
}
lm_y(linear_MMR_rate,10)
## percent_exempt
## 6.566816
lm_y(linear_nonmed_rate,10)
## percent_exempt
## 9.074078
# Solve for x in y = m*x + b for a given y, using a fitted simple linear model
lm_x <- function(lin, y) {
  m <- coef(lin)[2]  # slope
  b <- coef(lin)[1]  # intercept
  return((y - b) / m)
}
# Solve for the lm-predicted total exemption rate at a 5% MMR exemption rate
lm_x(linear_MMR_rate,5)
## (Intercept)
## 7.787952
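As a cross-check, the same numbers can be reproduced by hand from the coefficients printed in the `lm` output. This sketch hard-codes the rounded coefficients, so the results agree with the output above only to about two decimals; the variable names are illustrative assumptions.

```r
# Rounded coefficients copied from the printed lm() output
mmr <- c(intercept = -0.5163, slope = 0.7083)
nonmed <- c(intercept = -0.2264, slope = 0.9300)

# Predicted rates at a 10% total exemption rate (y = m*x + b)
mmr_at_10 <- unname(mmr["slope"] * 10 + mmr["intercept"])
nonmed_at_10 <- unname(nonmed["slope"] * 10 + nonmed["intercept"])

# Total exemption rate at which the fitted MMR exemption rate reaches 5%
# (x = (y - b) / m)
total_at_mmr_5 <- unname((5 - mmr["intercept"]) / mmr["slope"])

print(round(c(mmr_at_10, nonmed_at_10, total_at_mmr_5), 2))
```

Read this way, the fit puts a school at the 5% MMR exemption threshold when its total exemption rate is roughly 7.8%, which is below the 10% general threshold.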
Medical exemption rates show a weak positive correlation to total exemption rates. Both non-medical and MMR exemptions show a strong positive correlation to total exemption rates.
Public schools have a higher median and larger spread of total exemptions, non-medical exemptions, and MMR exemptions. The spread is very tight for medical exemptions, with comparable medians. Some of the variability in the public schools can be explained by the fact that public schools have a larger range of enrolled students. However, this doesn’t explain observations for medical exemptions.
This is interesting to me. The median level is about the same for public vs. private (not-public) schools, but the spread is much greater for private schools. I’ve looked at analyses from other states that essentially show that not all private schools, even by type of private school (Waldorf, Montessori), are created equal in terms of vaccination attitudes (tendency towards personal/non-medical exemptions).
I don’t know if this is true of my data set, but there seems to be more variability in private vs. public schools. These plots suggest this is true of non-medical and MMR exemptions, but not medical exemptions.
Medical exemption rates do not have a strong relationship with either non-medical or MMR exemption rates. Non-medical exemption rates and MMR exemption rates have a strong positive relationship.
Technically, some of the bivariate plots could be considered multivariate, since rates are derived from more than one variable. Below I try to understand whether the relationships change significantly as a function of year or school type. I also include some other visualizations to help foster an understanding of the data.
I added vertical lines at 5% and 10%, which are estimates of the safety thresholds for MMR and total exemption rates, respectively (see the intro for further detail). These are for purposes of illustration.
I grouped observations by county, and by county and school type (public, not public), for the most recent school year, 2013-2014. I used a data subset that includes only schools with enrollment of 10-2500 students for the year 2013. See the earlier explanation of the enrollment subset. I restricted the year because there was very little variation from year to year and 2013 is the most current data.
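A minimal sketch of this kind of county-level grouping with dplyr, run on made-up toy data rather than the real school data. The column names year and enrollment and the object names schools_toy and county_toy are assumptions for illustration; the summary column names mirror those in the output. Adding school_type to group_by() would give the county and school type version.

```r
library(dplyr)

# Toy stand-in for the school-level data; all values are made up.
schools_toy <- data.frame(
  school_county  = c("KING", "KING", "KING", "FERRY", "FERRY"),
  school_type    = c("Public", "Public", "not_Public", "Public", "not_Public"),
  year           = c(2013, 2013, 2013, 2013, 2012),  # hypothetical column name
  enrollment     = c(500, 1200, 8, 150, 60),         # hypothetical column name
  percent_exempt = c(4.0, 6.0, 25.0, 12.0, 20.0),
  stringsAsFactors = FALSE
)

county_toy <- schools_toy %>%
  # keep only the 2013 school year and schools with 10-2500 students
  filter(year == 2013, enrollment >= 10, enrollment <= 2500) %>%
  group_by(school_county) %>%
  summarise(
    mean_percent_exempt   = round(mean(percent_exempt), 2),
    median_percent_exempt = round(median(percent_exempt), 2),
    max_percent_exempt    = max(percent_exempt),
    min_percent_exempt    = min(percent_exempt),
    schools               = n(),              # number of schools in the county
    students              = sum(enrollment)   # total enrolled students
  ) %>%
  arrange(desc(students))                     # largest counties first

print(county_toy)
```

In this toy example the 8-student school and the 2012 row are dropped by the filter, so KING is summarised from two schools and FERRY from one.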
## [1] "school_county" "mean_percent_medical_exempt"
## [3] "median_percent_medical_exempt" "max_percent_medical_exempt"
## [5] "min_percent_medical_exempt" "mean_percent_exempt_MMR"
## [7] "median_percent_exempt_MMR" "max_percent_exempt_MMR"
## [9] "min_percent_exempt_MMR" "mean_percent_exempt"
## [11] "median_percent_exempt" "max_percent_exempt"
## [13] "min_percent_exempt" "mean_percent_nonmedexempt"
## [15] "median_percent_nonmedexempt" "max_percent_nonmedexempt"
## [17] "min_percent_nonmedexempt" "schools"
## [19] "students"
## school_county mean_percent_medical_exempt
## Length:39 Min. :0.0000
## Class :character 1st Qu.:0.3250
## Mode :character Median :0.5900
## Mean :0.8056
## 3rd Qu.:1.1700
## Max. :1.9900
## median_percent_medical_exempt max_percent_medical_exempt
## Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.: 1.425
## Median :0.4000 Median : 4.000
## Mean :0.3887 Mean : 6.151
## 3rd Qu.:0.5750 3rd Qu.: 8.040
## Max. :2.1100 Max. :29.630
## min_percent_medical_exempt mean_percent_exempt_MMR
## Min. :0 Min. : 0.880
## 1st Qu.:0 1st Qu.: 3.025
## Median :0 Median : 4.420
## Mean :0 Mean : 5.170
## 3rd Qu.:0 3rd Qu.: 6.360
## Max. :0 Max. :14.680
## median_percent_exempt_MMR max_percent_exempt_MMR min_percent_exempt_MMR
## Min. : 0.730 Min. : 0.91 Min. :0.0000
## 1st Qu.: 2.285 1st Qu.:10.42 1st Qu.:0.0000
## Median : 3.300 Median :22.81 Median :0.0000
## Mean : 3.924 Mean :23.12 Mean :0.6167
## 3rd Qu.: 4.770 3rd Qu.:33.33 3rd Qu.:0.4000
## Max. :11.170 Max. :64.29 Max. :7.1400
## mean_percent_exempt median_percent_exempt max_percent_exempt
## Min. : 1.320 Min. : 1.060 Min. : 2.13
## 1st Qu.: 4.390 1st Qu.: 3.570 1st Qu.:15.02
## Median : 7.120 Median : 5.610 Median :30.77
## Mean : 7.723 Mean : 6.429 Mean :28.47
## 3rd Qu.: 8.695 3rd Qu.: 7.205 3rd Qu.:42.28
## Max. :22.830 Max. :23.080 Max. :71.43
## min_percent_exempt mean_percent_nonmedexempt median_percent_nonmedexempt
## Min. : 0.000 Min. : 1.040 Min. : 0.670
## 1st Qu.: 0.000 1st Qu.: 3.860 1st Qu.: 2.970
## Median : 0.000 Median : 6.440 Median : 5.020
## Mean : 1.415 Mean : 7.056 Mean : 5.869
## 3rd Qu.: 1.315 3rd Qu.: 7.730 3rd Qu.: 6.720
## Max. :14.000 Max. :22.180 Max. :23.080
## max_percent_nonmedexempt min_percent_nonmedexempt schools
## Min. : 2.13 Min. : 0.000 Min. : 1.00
## 1st Qu.:13.77 1st Qu.: 0.000 1st Qu.: 13.00
## Median :26.56 Median : 0.000 Median : 24.00
## Mean :26.81 Mean : 1.246 Mean : 58.79
## 3rd Qu.:37.98 3rd Qu.: 1.170 3rd Qu.: 53.50
## Max. :71.43 Max. :12.000 Max. :581.00
## students
## Min. : 28
## 1st Qu.: 2648
## Median : 8244
## Mean : 27302
## 3rd Qu.: 23350
## Max. :289034
## Classes 'tbl_df', 'tbl' and 'data.frame': 39 obs. of 19 variables:
## $ school_county : chr "KING" "PIERCE" "SNOHOMISH" "CLARK" ...
## $ mean_percent_medical_exempt : num 1.99 0.72 1.13 0.59 0.87 0.27 0.96 1.15 0.5 1.2 ...
## $ median_percent_medical_exempt: num 0.82 0.52 0.68 0.5 0.59 0.16 0.73 0.55 0.21 0.83 ...
## $ max_percent_medical_exempt : num 29.63 7.46 12.26 2.3 14.29 ...
## $ min_percent_medical_exempt : num 0 0 0 0 0 0 0 0 0 0 ...
## $ mean_percent_exempt_MMR : num 3.51 2.96 4.42 5.36 5.5 1.37 5.26 4.67 3.19 6.33 ...
## $ median_percent_exempt_MMR : num 2.48 2.23 3.3 4.57 3.57 0.76 3.91 3.55 2.26 4.38 ...
## $ max_percent_exempt_MMR : num 36.5 33.1 32.2 36.5 44.9 ...
## $ min_percent_exempt_MMR : num 0 0 0 0 0 0 0 0.62 0 0 ...
## $ mean_percent_exempt : num 6.77 4.42 6.96 7.27 8.15 2.13 7.55 7.44 4.1 9.69 ...
## $ median_percent_exempt : num 5.03 3.66 5.42 6.5 6.03 1.38 5.97 5.95 3.21 7.27 ...
## $ max_percent_exempt : num 43.9 33.8 42.7 44 44.3 ...
## $ min_percent_exempt : num 0 0 0 0 0 0 0 1.25 0 0 ...
## $ mean_percent_nonmedexempt : num 4.99 3.88 5.97 6.84 7.3 1.91 6.7 6.44 3.62 8.58 ...
## $ median_percent_nonmedexempt : num 3.6 2.97 4.66 6.14 5.35 1.14 4.88 5.13 2.95 6.55 ...
## $ max_percent_nonmedexempt : num 42.9 33.8 37.4 42.7 44.3 ...
## $ min_percent_nonmedexempt : num 0 0 0 0 0 0 0 0.42 0 0 ...
## $ schools : int 581 253 205 129 165 90 86 78 57 72 ...
## $ students : num 289034 132412 108104 78252 76901 ...
## [1] "school_county" "school_type"
## [3] "mean_percent_medical_exempt" "median_percent_medical_exempt"
## [5] "max_percent_medical_exempt" "min_percent_medical_exempt"
## [7] "mean_percent_exempt_MMR" "median_percent_exempt_MMR"
## [9] "max_percent_exempt_MMR" "min_percent_exempt_MMR"
## [11] "mean_percent_exempt" "median_percent_exempt"
## [13] "max_percent_exempt" "min_percent_exempt"
## [15] "mean_percent_nonmedexempt" "median_percent_nonmedexempt"
## [17] "max_percent_nonmedexempt" "min_percent_nonmedexempt"
## [19] "schools" "students"
## school_county school_type mean_percent_medical_exempt
## Length:69 not_Public:30 Min. :0.0000
## Class :character Public :39 1st Qu.:0.1900
## Mode :character Median :0.6300
## Mean :0.9884
## 3rd Qu.:1.2400
## Max. :7.1400
## median_percent_medical_exempt max_percent_medical_exempt
## Min. :0.0000 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.: 1.14
## Median :0.1600 Median : 2.47
## Mean :0.4633 Mean : 4.76
## 3rd Qu.:0.5800 3rd Qu.: 7.27
## Max. :7.1400 Max. :29.63
## min_percent_medical_exempt mean_percent_exempt_MMR
## Min. :0.0000 Min. : 0.870
## 1st Qu.:0.0000 1st Qu.: 3.100
## Median :0.0000 Median : 4.930
## Mean :0.1571 Mean : 7.024
## 3rd Qu.:0.0000 3rd Qu.: 8.230
## Max. :7.1400 Max. :28.570
## median_percent_exempt_MMR max_percent_exempt_MMR min_percent_exempt_MMR
## Min. : 0.610 Min. : 0.91 Min. : 0.000
## 1st Qu.: 2.400 1st Qu.: 8.82 1st Qu.: 0.000
## Median : 3.830 Median :17.39 Median : 0.000
## Mean : 6.021 Mean :19.72 Mean : 2.618
## 3rd Qu.: 6.630 3rd Qu.:30.77 3rd Qu.: 1.550
## Max. :28.570 Max. :64.29 Max. :28.570
## mean_percent_exempt median_percent_exempt max_percent_exempt
## Min. : 1.320 Min. : 0.980 Min. : 2.13
## 1st Qu.: 5.280 1st Qu.: 4.090 1st Qu.:10.53
## Median : 7.140 Median : 5.970 Median :23.08
## Mean : 9.782 Mean : 8.779 Mean :24.25
## 3rd Qu.:11.150 3rd Qu.: 9.900 3rd Qu.:37.50
## Max. :35.710 Max. :35.710 Max. :71.43
## min_percent_exempt mean_percent_nonmedexempt median_percent_nonmedexempt
## Min. : 0.000 Min. : 1.040 Min. : 0.670
## 1st Qu.: 0.000 1st Qu.: 4.450 1st Qu.: 3.510
## Median : 0.910 Median : 6.720 Median : 5.260
## Mean : 3.905 Mean : 8.932 Mean : 7.956
## 3rd Qu.: 3.230 3rd Qu.:10.530 3rd Qu.: 9.380
## Max. :35.710 Max. :31.640 Max. :28.570
## max_percent_nonmedexempt min_percent_nonmedexempt schools
## Min. : 2.13 Min. : 0.000 Min. : 1.00
## 1st Qu.:10.34 1st Qu.: 0.000 1st Qu.: 2.00
## Median :21.05 Median : 0.450 Median : 13.00
## Mean :22.77 Mean : 3.622 Mean : 33.23
## 3rd Qu.:36.21 3rd Qu.: 3.180 3rd Qu.: 29.00
## Max. :71.43 Max. :28.570 Max. :449.00
## students
## Min. : 11
## 1st Qu.: 336
## Median : 2502
## Mean : 15431
## 3rd Qu.: 9944
## Max. :260309
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 69 obs. of 20 variables:
## $ school_county : chr "ADAMS" "ASOTIN" "ASOTIN" "BENTON" ...
## $ school_type : Factor w/ 2 levels "not_Public","Public": 2 2 1 2 1 2 1 2 1 2 ...
## $ mean_percent_medical_exempt : num 0.22 2.29 0 0.36 1.56 0.33 0 0.64 5.34 0.64 ...
## $ median_percent_medical_exempt: num 0.19 2.29 0 0.22 0 0.22 0 0.46 0.87 0.55 ...
## $ max_percent_medical_exempt : num 0.74 2.47 0 2.78 7.27 ...
## $ min_percent_medical_exempt : num 0 2.11 0 0 0 0 0 0 0 0 ...
## $ mean_percent_exempt_MMR : num 3.71 3.7 2.97 2.9 5.26 ...
## $ median_percent_exempt_MMR : num 1.31 3.7 2.97 2.23 4 ...
## $ max_percent_exempt_MMR : num 30.77 5.3 2.97 14.7 12.5 ...
## $ min_percent_exempt_MMR : num 0.22 2.11 2.97 0 0 0 1.4 0 1.55 0 ...
## $ mean_percent_exempt : num 5.57 5.59 4.95 3.77 6.45 ...
## $ median_percent_exempt : num 2.03 5.59 4.95 3.19 7.94 ...
## $ max_percent_exempt : num 38.46 6.36 4.95 21.09 12.5 ...
## $ min_percent_exempt : num 0.87 4.82 4.95 0 0 0.67 1.4 0 4.65 0.15 ...
## $ mean_percent_nonmedexempt : num 5.36 3.3 4.95 3.44 4.89 ...
## $ median_percent_nonmedexempt : num 1.79 3.3 4.95 2.88 4.8 ...
## $ max_percent_nonmedexempt : num 38.46 3.89 4.95 21.09 12.5 ...
## $ min_percent_nonmedexempt : num 0.43 2.71 4.95 0 0 0.45 1.4 0 4.65 0 ...
## $ schools : int 13 2 1 50 7 29 2 21 4 113 ...
## $ students : num 6040 615 101 33026 1133 ...
## - attr(*, "vars")=List of 1
## ..$ : symbol school_county
## - attr(*, "indices")=List of 39
## ..$ : int 0
## ..$ : int 1 2
## ..$ : int 3 4
## ..$ : int 5 6
## ..$ : int 7 8
## ..$ : int 9 10
## ..$ : int 11
## ..$ : int 12 13
## ..$ : int 14
## ..$ : int 15 16
## ..$ : int 17 18
## ..$ : int 19
## ..$ : int 20 21
## ..$ : int 22 23
## ..$ : int 24 25
## ..$ : int 26 27
## ..$ : int 28 29
## ..$ : int 30 31
## ..$ : int 32 33
## ..$ : int 34 35
## ..$ : int 36 37
## ..$ : int 38
## ..$ : int 39 40
## ..$ : int 41 42
## ..$ : int 43
## ..$ : int 44
## ..$ : int 45 46
## ..$ : int 47 48
## ..$ : int 49 50
## ..$ : int 51
## ..$ : int 52 53
## ..$ : int 54 55
## ..$ : int 56 57
## ..$ : int 58 59
## ..$ : int 60
## ..$ : int 61 62
## ..$ : int 63 64
## ..$ : int 65 66
## ..$ : int 67 68
## - attr(*, "group_sizes")= int 1 2 2 2 2 2 1 2 1 2 ...
## - attr(*, "biggest_group_size")= int 2
## - attr(*, "labels")='data.frame': 39 obs. of 1 variable:
## ..$ school_county: chr "ADAMS" "ASOTIN" "BENTON" "CHELAN" ...
## ..- attr(*, "vars")=List of 1
## .. ..$ : symbol school_county
## Warning in rm(by_county_exempt, by_county_exempt_type,
## by_county_medexempt, : object 'by' not found
Bar plots using the county_group and county_group_type data frames.
County-level data for mean MMR and total exemption rates. The y axis is ordered by the number of students enrolled in each county. I used position = dodge for side-by-side comparison and position = fill to show the proportional contribution of each school type to the mean rate.
It is very clear from these bar plots that there are many counties with mean total and MMR exemption rates exceeding the 10% and 5% thresholds, respectively. It is also clear that this pattern does not depend on student population. There are several smaller counties with very high rates of exemption. However, the total number of exemptions in those counties is much smaller than in larger counties. For example, Ferry has only 678 students for the 2013-2014 school year, spread across several schools.
County level data for mean MMR and total exemption rates, by school type. The y axis is ordered by number of students enrolled for each county.
For both MMR and total exemption rates, private schools are comparable to or exceed public school exemption rates for many counties.
Melting the county_group data frame for additional bar plot visualizations.
Melt is achieved by using gather() [tidyr package], where mean_percent_medical_exempt and mean_percent_nonmedexempt are gathered into exemption type, with each individual mean captured in mean_percent_exemption_type.
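A minimal sketch of the gather() step on a toy data frame. The KING and PIERCE values are taken from the output in this report; the FERRY rates are made up, and the object names county_toy and melt_toy are assumptions for illustration. Using factor_key = TRUE stores the key column as a factor.

```r
library(tidyr)

# Toy county-level frame with only the columns needed for the melt.
county_toy <- data.frame(
  school_county               = c("KING", "PIERCE", "FERRY"),
  mean_percent_medical_exempt = c(1.99, 0.72, 1.50),   # FERRY value is made up
  mean_percent_nonmedexempt   = c(4.99, 3.88, 20.00),  # FERRY value is made up
  students                    = c(289034, 132412, 678),
  stringsAsFactors = FALSE
)

# Gather the two mean columns into a key column (exemption_type)
# and a value column (mean_percent_exemption_type).
melt_toy <- gather(
  county_toy,
  key   = "exemption_type",
  value = "mean_percent_exemption_type",
  mean_percent_medical_exempt, mean_percent_nonmedexempt,
  factor_key = TRUE
)

print(melt_toy)
```

Each county now appears once per exemption type, so the row count doubles (3 rows become 6 here, just as 39 become 78 in the real data).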
## Observations: 39
## Variables:
## $ school_county (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ mean_percent_medical_exempt (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_nonmedexempt (dbl) 4.99, 3.88, 5.97, 6.84, 7.30, 1....
## $ median_percent_nonmedexempt (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students (dbl) 289034, 132412, 108104, 78252, 7...
## Observations: 78
## Variables:
## $ school_county (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ median_percent_nonmedexempt (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students (dbl) 289034, 132412, 108104, 78252, 7...
## $ exemption_type (fctr) mean_percent_medical_exempt, me...
## $ mean_percent_exemption_type (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
Non-medical exemptions far exceed medical exemptions for all counties, contributing 70-100% of the total exemption rate, on a proportion basis.
Add a quartile field into melt_county_group, as a factor.
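One way a quartile factor like this could be built in base R is with quantile() and cut(). This is a sketch on a handful of illustrative values, not necessarily the approach used for the output in this report; the vector and the names breaks and quartile are assumptions.

```r
# Illustrative county mean exemption rates (a small toy vector).
mean_percent_exempt <- c(2.13, 4.42, 6.77, 6.96, 7.27, 8.15, 9.69, 22.83)

# Quartile boundaries: minimum, 25th, 50th, 75th percentiles, maximum.
breaks <- quantile(mean_percent_exempt, probs = seq(0, 1, by = 0.25))

# Assign each value to a quartile; include.lowest keeps the minimum in bin 1.
quartile <- cut(mean_percent_exempt, breaks = breaks,
                labels = 1:4, include.lowest = TRUE)

print(quartile)
```

Note that cut() needs unique break points, so this approach assumes the quartile boundaries do not coincide.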
## [1] 2 2 2 3 3 1 3 3 1 4 1 3 1 1 1 1 2 4 4 2 3 2 1 2 2 3 4 4 4 1 4 2 4 4 2
## [36] 4 1 3 3 2 2 2 3 3 1 3 3 1 4 1 3 1 1 1 1 2 4 4 2 3 2 1 2 2 3 4 4 4 1 4
## [71] 2 4 4 2 4 1 3 3
## Levels: 1 2 3 4
## Observations: 78
## Variables:
## $ school_county (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ median_percent_nonmedexempt (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students (dbl) 289034, 132412, 108104, 78252, 7...
## $ exemption_type (fctr) mean_percent_medical_exempt, me...
## $ mean_percent_exemption_type (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
## $ mean_percent_exempt_quartile (int) 2, 2, 2, 3, 3, 1, 3, 3, 1, 4, 1,...
Below I use position = stack and add labels. My intent was to show the total exemption rate, the contribution from each type of exemption, and the number of students in the county. I then shade at 8% total exemption (corresponding to a roughly predicted 5% MMR exemption rate) to show the danger zone. However, it looks like stack is just “stacking”, rather than adjusting proportions. The plots also suffer from other shortcomings. I’m leaving them here just to illustrate the code, and I hope that coaches will have some suggestions for improvements.
I didn’t observe any relationships that stand out from the bivariate plots section.
There were no surprising interactions. However, it was interesting to see that some of the highest mean exemption rates were in smaller counties. Previously I had noticed that there was a large variance in exemption rates on the smaller end of the enrollment spectrum, which is consistent with what I’m seeing at the county level.
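The difference between stacking and proportions can be sketched on toy data. In ggplot2, position = "stack" piles up the raw values so each bar's height is the sum of its parts, while position = "fill" rescales every bar to a total of 1. The data frame melt_toy, its values, and the share column are made up for illustration.

```r
library(ggplot2)

# Toy melted data: two counties, two exemption types (values made up).
melt_toy <- data.frame(
  school_county               = rep(c("A", "B"), each = 2),
  exemption_type              = rep(c("medical", "nonmedical"), times = 2),
  mean_percent_exemption_type = c(1, 4, 2, 18)
)

# Share of each county's total contributed by each exemption type;
# this is the quantity that position = "fill" draws.
melt_toy$share <- with(
  melt_toy,
  mean_percent_exemption_type /
    ave(mean_percent_exemption_type, school_county, FUN = sum)
)

# Raw values stacked: bar length equals the summed rates per county.
p_stack <- ggplot(melt_toy, aes(x = school_county,
                                y = mean_percent_exemption_type,
                                fill = exemption_type)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_flip()

# Same data rescaled so every bar has length 1 (proportions).
p_fill <- ggplot(melt_toy, aes(x = school_county,
                               y = mean_percent_exemption_type,
                               fill = exemption_type)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip()
```

Printing p_stack and p_fill side by side shows the same data on an absolute scale and on a proportional scale.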
##
## Pearson's product-moment correlation
##
## data: vaccReported_NGT50_exemptrate$percent_exempt and vaccReported_NGT50_exemptrate$percent_medical_exempt
## t = 30.3715, df = 6681, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3270622 0.3691997
## sample estimates:
## cor
## 0.3483069
##
## Pearson's product-moment correlation
##
## data: vaccReported_NGT50_exemptrate$percent_exempt and vaccReported_NGT50_exemptrate$percent_nonmedexempt
## t = 292.7415, df = 6681, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9613855 0.9648553
## sample estimates:
## cor
## 0.9631604
##
## Pearson's product-moment correlation
##
## data: vaccReported_NGT50_exemptrate$percent_exempt and vaccReported_NGT50_exemptrate$percent_exempt_MMR
## t = 116.7002, df = 6681, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8110272 0.8268150
## sample estimates:
## cor
## 0.8190762
Specific rates (% medical, non-medical, and MMR exemption) vs. total percent exempt for schools with <= 50% total exemption, with smoothing lines per school year. Medical exemption rates do not have a strong correlation (Pearson’s ~.35) with total exemption rates, whereas non-medical (Pearson’s ~.96) and MMR exemption rates (Pearson’s ~.82) do. The correlation of non-medical exemption rates with the total exemption rate is stronger than that of MMR rates. There is very little change in the smoothing lines and degree of correlation from year to year.
Mean percent exemption rate per county, ordered by student population (# of students enrolled), for the school year 2013 and schools with 10-2500 total students. The gradient color scale represents mean % exempt MMR for each county. Neither mean total nor mean MMR exemption rates appear to vary with student population.
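Pearson correlations like the ones quoted here can be computed with cor.test() from base R. The sketch below uses a small made-up data frame (rates_toy) in which the total rate is assumed to be the sum of the medical and non-medical rates; that assumption is for illustration only and may not hold exactly in the real data.

```r
# Toy school-level rates (made up); column names follow the report's data.
rates_toy <- data.frame(
  percent_nonmedexempt   = c(1, 2, 3, 4, 5, 6, 7, 8),
  percent_medical_exempt = c(0.5, 0.2, 0.6, 0.1, 0.4, 0.3, 0.7, 0.2)
)

# Assumption for this toy example: total = medical + non-medical.
rates_toy$percent_exempt <-
  rates_toy$percent_nonmedexempt + rates_toy$percent_medical_exempt

# Pearson correlation of the total rate with each component.
ct_nonmed <- cor.test(rates_toy$percent_exempt,
                      rates_toy$percent_nonmedexempt, method = "pearson")
ct_med    <- cor.test(rates_toy$percent_exempt,
                      rates_toy$percent_medical_exempt, method = "pearson")

r_nonmed <- unname(ct_nonmed$estimate)  # correlation coefficient
r_med    <- unname(ct_med$estimate)

print(c(nonmedical = r_nonmed, medical = r_med))
```

When one component makes up most of the total, as non-medical exemptions do in this toy data, a high correlation with the total is partly built in, which is worth keeping in mind when reading the coefficients.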
Schools with higher total exemption rates tend to have higher MMR exemption rates, consistent with plot one.
Mean % exemption rate and proportion % exempt by exemption type for counties in quartile four (top 25% of mean exemption rates), along with the affected student population. A red shade is used to show the area at and beyond what is generally considered the threshold for herd immunity.
Six counties in the top quartile are at or beyond the 10% herd immunity threshold, while four are below it. The vast majority of exemptions for each county are non-medical exemptions, also known as personal belief exemptions.
It is important to note that herd immunity thresholds are based on specific assumptions, which include opportunity for exposure, how contagious a disease is, and population mixing. These may not apply evenly to all schools. Measles is highly contagious with a large opportunity for exposure relative to, say, Hep B. Estimates for measles put a safe level of exemption closer to 5%.
This has been an interesting project. My original intent was to use the zip code data and data from the US Census Bureau (economic and demographic data down to the tract level) and the WA state government (demographic data at the school level), to explore school vaccine exemptions and exemption rates. Cleaning and tidying the data set for this purpose was daunting, but enjoyable. As a step towards my original intent, I decided to start exploring the data to develop a better understanding of the set and how exemption rates varied by type, year, student population, etc. What I found was that I had a rich data set all on its own. I was able to view the high level of variance in the data, particularly associated with school size, as well as how types of exemptions and types of schools were contributing to exemption rates. I made some surprising findings:
and some less than surprising findings:
In general, the body of scientific evidence suggests that vaccine exemption rates above 5-10% (depending on disease attributes) put the non-immune population (those with no natural immunity or low resistance who have not received the vaccine, or for whom the vaccine was not effective) at risk. In WA state, the vast majority of exemptions are personal belief (non-medical) exemptions. At the county level there are 6 counties with mean exemption rates exceeding 10%, with even more exceeding the 5% MMR threshold.
Now that I have a solid understanding of the underlying data I would like to do the following as next steps.
I also plan to revisit the data set containing other disease exemption data to see how each of those varies with total exemption rate and get a better idea of which diseases people are not being vaccinated for at specific geographical levels.
Udacity Forums and Coaches Lounge
R for Dummies
R Cookbook by Paul Teetor (O’Reilly)
Stack Overflow